Take-home Exercise 5

In this Take-home Exercise, I will explore to reveal the social areas of the city of Engagement, Ohio USA. I will also visualise and analyse locations with traffice bottleneck of the city.

Che Xuan https://www.linkedin.com/in/jacob-che-xuan-b646a9123/
2022-05-29

Task

Challenge 2: Patterns of Life considers the patterns of daily life throughout the city. You will describe the daily routines for some representative people, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time and seasons.

In Challenge 2, you will use visual analytic techniques to address these questions:

Overview

In this take-home exercise, appropriate visual analytics methods are used to reveal the social areas as well as the locations with traffic bottleneck of the city of Engagement, Ohio USA while addressing the questions stated in the Task section.

The data are processed by using appropriate tidyverse family of packages.

Sketch of Proposed Design

The picture below shows a sketch of the initial design proposed.

Installing & Launching R Packages

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('sf', 'tmap', 'tidyverse', 'lubridate', 'clock', 'sftime', 'rmarkdown', 'ggiraph', 'plotly', 'DT', 'patchwork','crosstalk')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

In the code chunk below, read_sf() of sf package is used to parse School.csv, Pubs.csv, Apartments.csv, Buildings.csv, Employer.csv, and Restaurants.csv from the data folder into R as sf data.frames. The 6 files parsed represented the various social areas we would like to explore.

schools <- read_sf("data/wkt/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")  
pubs <- read_sf("data/wkt/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/wkt/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

print(buildings)
Simple feature collection with 1042 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -4762.191 ymin: -30.08359 xmax: 2650 ymax: 7850.037
CRS:           NA
# A tibble: 1,042 x 5
   buildingId                       location buildingType maxOccupancy
   <chr>                           <POLYGON> <chr>        <chr>       
 1 1          ((350.0639 4595.666, 390.0633~ Commercial   ""          
 2 2          ((-1926.973 2725.611, -1948.1~ Residental   "12"        
 3 3          ((685.6846 1552.131, 645.9985~ Commercial   ""          
 4 4          ((-976.7845 4542.382, -1053.2~ Commercial   ""          
 5 5          ((1259.306 3572.727, 1299.255~ Residental   "2"         
 6 6          ((478.8969 1082.484, 473.6596~ Commercial   ""          
 7 7          ((-1920.823 615.7447, -1960.8~ Residental   ""          
 8 8          ((-3302.657 5394.354, -3301.5~ Commercial   ""          
 9 9          ((-600.5789 4429.228, -495.95~ Commercial   ""          
10 10         ((-68.75908 5379.924, -28.782~ Residental   "5"         
# ... with 1,032 more rows, and 1 more variable: units <chr>
print(restaurants)
Simple feature collection with 20 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -4131.414 ymin: 1194.129 xmax: 1407.711 ymax: 7236.526
CRS:           NA
# A tibble: 20 x 5
   restaurantId foodCost maxOccupancy             location
   <chr>        <chr>    <chr>                     <POINT>
 1 445          5.15     71            (631.5131 2001.477)
 2 446          4.17     82              (413.84 1194.129)
 3 447          5.87     119           (497.9968 1624.515)
 4 448          4.07     98            (698.2411 4392.417)
 5 449          5.11     53            (1407.711 4010.457)
 6 895          4.44     103           (-1623.074 3872.03)
 7 896          5.86     107          (-2126.172 4285.182)
 8 897          5.76     104          (-1989.635 3368.765)
 9 898          4.22     115           (-1771.452 4343.58)
10 899          5.65     85            (-820.929 4497.243)
11 1345         4.7      72           (-1176.801 5846.647)
12 1346         5.85     74           (-3529.558 4809.641)
13 1347         4.41     87           (-1221.405 4941.003)
14 1348         4.23     57           (-508.1269 6307.584)
15 1349         4.77     104          (-1176.699 4726.118)
16 1801         5.51     64           (-3136.128 6671.917)
17 1802         5.08     72           (-4002.934 6671.997)
18 1803         4.35     94           (-3182.886 7167.609)
19 1804         5.92     48           (-4131.414 7236.526)
20 1805         5.59     85             (-3927.91 5494.64)
# ... with 1 more variable: buildingId <chr>

Plot Building Footprint Map

The code chunk below plots the building polygon features by using tm_polygon() of tmap.

tmap_mode("view")
tm_shape(buildings) +
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_layout(title= "Building Footprint Map of the City of Engagement")
tmap_mode("plot")

Plotting A Composite Map

The code chunk below is used to plot a composite map by combining the simple feature data.frames of the buildings and the various social areas.

tmap_mode("view") +
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "red") +
tm_shape(apartments) +
  tm_dots(col = "lightblue") +
tm_shape(pubs) +
  tm_dots(col = "green") +
tm_shape(restaurants) +
  tm_dots(col = "blue") +
tm_shape(schools) +
  tm_dots(col = "yellow") +
tm_layout(title= "Composite Map of Social Areas of the City of Engagement")
tmap_mode("plot")

We arrive at the following points regarding social areas of the city:

Central Region: Central business district of the city, with greatest number of companies/restaurants/pubs located in the center of this region. The inner region is surrounded by residential apartments.

North-West Region: Mature estate with densely populated apartments as well as its own list of companies. Reasonably numbers of restaurants, pubs and schools are also seen in this region.

South Region & East Region: Non-mature estates with relatively less apartments and companies. The number of restaurants, pubs and schools are proportionally less too.

Importing Data

In the code chunk below, read_sf() of sf package is used to parse ParticipantStatusLogs1.csv from the rawdata folder into R as a simple feature data.frame.

logs <- read_sf("rawdata/ActivityLogs/ParticipantStatusLogs1.csv", 
                options = "GEOM_POSSIBLE_NAMES=currentLocation")

Data Wrangling

logs_selected <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                zone = "",
                format = "%Y-%m-%dT%H:%M:%S")) %>%
  mutate(day = get_day(Timestamp)) %>%
  filter(currentMode == "Transport")

Data frame logs_selected is saved in RDS format to avoid uploading large files to Git.

saveRDS(logs_selected, 'data/logs_selected')
logs_selected <- readRDS('data/logs_selected')

Plotting Hexagon Binning Map

In the code chunk below, st_make_grid() of sf package is used to create hexagons.

hex <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)

The code chunk below perform point in polygon overlay by using st_join() of sf package.

points_in_hex <- st_join(logs_selected, 
                         hex, 
                         join=st_within)
plot(points_in_hex, pch='.')

In the code chunk below, st_join() of sf package is used to count the number of event points in the hexagons.

points_in_hex <- st_join(logs_selected, 
                        hex, 
                        join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id)
head(points_in_hex)
# A tibble: 6 x 2
  hex_id pointCount
   <int>      <int>
1    169         35
2    212         56
3    225         21
4    226         94
5    227         22
6    228         45

In the code chunk below, left_join() of dplyr package is used to perform a left-join by using hex as the target table and points_in_hex as the join table. The join ID is hex_id.

hex_combined <- hex %>%
  left_join(points_in_hex, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)

In the code chunk below, tmap package is used to create the hexagon binning map.

tm_shape(hex_combined %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          style = "headtails",
          title="Point Count") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = "Hexagon Binning Map of the City of Engagement",
    main.title.position = "center",
    main.title.color = "black") +
  tm_scale_bar(text.size = 0.5,
               color.dark = "black",
               color.light = "white",
               lwd = 1) +
  tm_compass(type="radar",
             north = 0,
             text.size = 0.8,
             show.labels = 3,
             cardinal.directions = c("N", "E", "S", "W"),
             lwd = 1,
             position = c("right", "top"))

The darker-colored points are the busiest areas in Engagement.

The possible traffic bottlenecks (as seen in the figure below) are:

We also notice that these 2 paths are indeed the only paths possible connecting the regions together.

Interactive Scatterplot

In the code chunk below, ggplotly() of plotly and bscols() of crosstalk are used to plot an interactive scatter plot with a table to visualize the number of event points in the hexagons

dd <- highlight_key(points_in_hex)

graph1 <- ggplot(dd, aes(x = hex_id, y = pointCount)) +
  geom_point(size=1) +
  theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(), axis.line= element_line(color= 'grey')) +
  ggtitle("Distribution of Hex Id") +
  xlab("Hex Id") +
  ylab("Point Count")

gg <- highlight(ggplotly(graph1),
                "plotly_selected")

crosstalk::bscols(gg, 
                  widths = c(12,12),
                  DT::datatable(dd,
                                rownames = FALSE,
                                colnames = c('Hex Id' = 'hex_id',
                                             'Point Count' = 'pointCount'), 
                                filter = 'top',
                                class = 'display'))

Plotting Hexagon Binning Map (AM vs PM)

We will now look at the difference in the Hexagon Binning Map of AM vs that of PM.

tm_shape(hex_combined_am %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          style = "headtails",
          title="Point Count") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = "Hexagon Binning Map (AM)",
    main.title.position = "center",
    main.title.color = "black") +
  tm_scale_bar(text.size = 0.5,
               color.dark = "black",
               color.light = "white",
               lwd = 1) +
  tm_compass(type="radar",
             north = 0,
             text.size = 0.8,
             show.labels = 2,
             cardinal.directions = c("N", "E", "S", "W"),
             lwd = 1,
             position = c("right", "top"))

tm_shape(hex_combined_pm %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          style = "headtails",
          title="Point Count") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = "Hexagon Binning Map (PM)",
    main.title.position = "center",
    main.title.color = "black") +
  tm_scale_bar(text.size = 0.5,
               color.dark = "black",
               color.light = "white",
               lwd = 1) +
  tm_compass(type="radar",
             north = 0,
             text.size = 0.8,
             show.labels = 2,
             cardinal.directions = c("N", "E", "S", "W"),
             lwd = 1,
             position = c("right", "top"))

From both maps, we can see that the path joining Central Region to North-West Region is more prominent in AM than in PM. Same pattern is also spotted for the path joining South Region to East Region. This might be because both paths are the main paths for people to travel to work in the morning, but after work in the evening some people may have gone to other places for dinner or gatherings and returned home using other routes thereafter.